Abstract

This paper presents two cases of a random banking data generator
based on migration matrices and scoring rules. Such a generator is a
promising tool in the search for a method of proving comparisons
between credit scoring techniques. The paper analyzes the influence of
one cyclic macro-economic variable on the stability over time of
account and client characteristics. The generated data are useful for
analyses aimed at understanding the complexity of banking processes,
and also for students and their research. The case studies lead to a
notable conclusion about crisis behavior: if a crisis acts through many
factors, covering both application and behavioral customer
characteristics, then it is very difficult to identify these factors in
a typical scoring analysis, and the crisis shows up everywhere, in
every kind of risk report.
1 Introduction

Predictive models, and credit scoring models in particular, are now
very popular in the management of banking processes [1]. Risk
scorecards are typically used in the credit acceptance process to
optimize and control risk. Various forms of behavioral scorecards are
also used for the management of repeat business and for PD models in
the Basel RWA (Risk Weighted Assets) calculation [2]. It is remarkable
that a list of about 10 account or client characteristics can predict
future behavior: the style of payments and the delinquency.

It may seem trivial to say that scorecards are useful and that the
methodology is well known; nevertheless, credit scoring can still be
developed and new techniques should be tested. The main problem today
is that there is no general approach to testing new methods and
techniques, and no method of proving their correctness. Many good
articles are based on one particular case study, on one example of
real data coming from one or a few banks [3], [1] and [5]. From a
theoretical point of view, even if good results and sound arguments
are presented for choosing one method over another, the difference is
demonstrated only on that particular data set. Nobody can prove it
for other data, and nobody can guarantee correctness in all cases.

There are also other important reasons why real banking data are not
globally available and cannot be used by every analyst, such as legal
constraints, or new products with too short a data history. These two
factors suggest looking for quite a different approach to testing
predictive models in banking.

It is a good idea to develop two parallel approaches: real data and
randomly simulated data. Although the second cannot replace real data,
it can be very useful for better understanding the relations among
various factors in the data and the complexity of the process, and it
can be an attempt to create a more general class of semi-real data.

Consider some advantages of randomly generated data:

1. Today many analysts try to understand and analyze the last crisis [0],
and among other things they develop methods of identifying
sub-portfolios whose risk is stable over time. The topic is not easy
and cannot be solved by typical predictive models based on a target
variable, as in the case of default risk. The notion of stability
cannot be defined for a single account or client: one cannot say that
an account is stable, only a set of accounts can be tested. The
technique therefore has to be developed quite differently from typical
predictive modeling with a target variable. A simple conclusion
follows: the more accounts, the more robust the stability testing.
With a random data generator, various scenarios can be tested to see
and better understand the problem.

2. Scoring challenges, or "scoring Olympic games". From time to time
various communities organize contests to find good modelers or to test
new techniques. Sometimes the data are taken from a case that is too
real, meaning that some real processes are not predictable because
they are influenced by many immeasurable factors. Even if scoring
models are used in practice in such cases, it is not a good idea to
use that data for a contest. The best and fairest solution is to use
a random data generator whose process is directly predictable.

3. Reject inference [1]. This topic still needs development. Random
data can also be generated for cases that would be rejected in
reality, so they can be used to better estimate risk in blank areas
and to gain experience.

4. Today there are two or more techniques of scorecard building [7].
Comparisons and analyses are needed to define recommendations: where
and under what conditions one method should be used rather than
another. The same applies to different variable selection methods.

5. Product profitability, bad debts and cut-offs. All these notions
can be tested on random data, broadening the analyst's experience.

6. Random data can also be an important factor in data standardization
and auditing. Imagine that ready-to-run software tools for MIS
(Management Information Systems) and KPI (Key Performance Indicators)
reporting are prepared on a generic data structure first loaded with
random data. Then auditing of any other data is reduced to auditing
the data upload process.

Simulated data are used in many areas; for example, they are very
useful in research on telecommunication networks with systems like
OPNET [S]. Simulated data have also been developed in the banking area
by [S] and [TU].

The simplest retail consumer finance portfolio is the fixed installment
loan portfolio. Here the process can be simplified by the following
assumptions:

• for all accounts one due date is defined, in the middle of the month
(every 15th),

• every client has only one credit,

• a client can pay one whole installment, a few installments, or
nothing; there are only two events: payment or missing payment,

• delinquency is measured at the end of each month as the number of
due installments,

• all customer and account properties are randomly generated from
suitably defined random distributions,

• if the number of due installments reaches 7 (180 days past due), the
process is stopped and the account is marked with the bad status;
further collection steps are omitted,

• if the number of paid installments reaches the number of all
installments, the process is stopped and the account is marked with
the closed status,

• payments or missing payments are determined by three factors: a score
calculated on account characteristics, the migration matrix, and an
adjustment of that matrix by one cyclic macro-economic variable,

• the score is calculated separately for every group of due
installments. In a more general case a different score can be defined
for every status: due installments 0, 1, ..., 6.

It is worth emphasizing that risk management today has very good tools
for risk control: even if a crisis came and was not predicted
correctly, it could be detected very quickly. It seems that the best
of the risk control tools is migration matrix reporting.

The goal of this paper can also be formulated as follows: to create
random data that reproduce the results observed in reality in typical
reports such as migration matrices, flow rates or roll rates, and
vintage or default rates.
2 Detailed description of data generator

2.1 The main options

All data are generated from the starting date $T_s$ to the ending date
$T_e$.

The migration matrix $M_{ij}$ (transition matrix) is defined as the
percentage of accounts that move, within one month, from $i$ due
installments to $j$ due installments.
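As an illustration of this definition, the following minimal Python
sketch estimates a migration matrix from observed one-month
transitions. The function name `migration_matrix`, the input format
(pairs of due-installment counts) and the use of 8 states (0 to 7 due
installments) are assumptions made for this example only.

```python
def migration_matrix(transitions, n_states=8):
    """Estimate M[i][j]: share of accounts moving from i to j due installments.

    `transitions` is an iterable of (i, j) pairs, one per account and month.
    The 8 states (0..7 due installments) are an assumption of this sketch.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in transitions:
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state with no observed accounts gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical transitions: three accounts start with 0 due installments,
# two accounts start with 1 due installment.
example = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 2)]
M = migration_matrix(example)
print(M[0][:3], M[1][:3])
```

Each non-empty row of the result sums to one, so it can be read directly
as the distribution of next-month states for accounts currently in
state $i$.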

There is one macro-economic variable that depends only on time,
$E(m)$, where $m$ is the number of months from $T_s$. It should satisfy
the simple condition $0.01 < E(m) < 0.9$, because it is used to adjust
the migration matrix and so influences the risk: in some months it
produces a slightly higher risk and in some months a lower one.

2.2 Production dataset 

The first dataset contains all applications with all available customer
characteristics and credit properties.

Customer characteristics (application data):

• Birthday - $T_{Birth}$, with the distribution $D_{Birth}$

• Income - $x^{app}_{Income}$ - $D_{Income}$

• Spending - $x^{app}_{Spending}$ - $D_{Spending}$

• Four nominal characteristics - $x^{app}_{Nom_1}, \ldots, x^{app}_{Nom_4}$ -
$D_{Nom_1}, D_{Nom_2}, \ldots, D_{Nom_4}$; in practice they can represent
variables like job category, marital status, home status, education
level, or others.

• Four interval characteristics - $x^{app}_{Int_1}, \ldots, x^{app}_{Int_4}$ -
$D_{Int_1}, D_{Int_2}, \ldots, D_{Int_4}$; they represent variables like job
seniority, personal account seniority, number of households, housing
spending, or others.

Credit properties (loan data):

• Installment amount - $x^{loan}_{Inst}$, with the distribution $D_{Inst}$

• Number of installments - $x^{loan}_{Ninst}$ - $D_{Ninst}$

• Loan amount - $x^{loan}_{Amount} = x^{loan}_{Inst} \cdot x^{loan}_{Ninst}$

• Date of application (year, month) - $T_{app}$

• Id of application

The number of rows per month is generated from the distribution
$D_{Applications}$.

2.3 Transaction dataset 

Every row contains the following information (transaction data):

• Id of application

• Date of application (year, month) - $T_{app}$

• Current month - $T_{cur}$

• Number of due installments (number of missing payments) - $x^{act}_{ndue}$

• Number of paid installments - $x^{act}_{npaid}$

• Status - $x^{act}_{status}$ - Active (A) when the loan is still not paid,
Closed (C) when it is paid, or Bad (B) when $x^{act}_{ndue} = 7$

• Pay days - $x^{act}_{days}$ - the number of days, from the interval
$[-15, 15]$, before or after the due date in the current month when the
payment was made; if the payment is missing, then pay days are also
missing.

2.4 Inserting the Production dataset into the Transaction dataset

Every month, the new rows of the Production dataset are added to the
Transaction dataset with the following values:

$T_{cur} = T_{app}, \quad x^{act}_{ndue} = 0, \quad x^{act}_{npaid} = 0, \quad x^{act}_{status} = A, \quad x^{act}_{days} = 0.$

This process inserts the starting points of new accounts.
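The insertion step can be sketched in Python as follows. This is only
an illustration: the dictionary keys, the function name
`insert_new_accounts` and the starting values (zero due and paid
installments, Active status, zero pay days) are assumptions of this
sketch.

```python
def insert_new_accounts(transactions, new_applications):
    """Append one starting row per new application to the transaction list."""
    for app in new_applications:
        transactions.append({
            "id": app["id"],
            "t_app": app["t_app"],
            "t_cur": app["t_app"],  # the first row is dated at application
            "n_due": 0,             # assumed starting value
            "n_paid": 0,            # assumed starting value
            "status": "A",          # assumed: new accounts start as Active
            "days": 0,              # assumed starting value
        })
    return transactions


# Two hypothetical applications from January 1970.
rows = insert_new_accounts([], [{"id": 1, "t_app": (1970, 1)},
                                {"id": 2, "t_app": (1970, 1)}])
print(len(rows), rows[0]["status"])
```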
2.5 Analytical Base Table - ABT dataset

The history of payments of every account depends on behavioral data,
that is, on the behavior of previous payments. This is, of course, an
assumption of this data generator.

There are many ways of creating behavioral characteristics. Here simple
methods are presented: they take the last available states and track
their evolution over time. All data are prepared in ABT datasets; the
notion of Analytical Base Table is used by SAS Credit Scoring Solution [TT].

Let the current date $T_{cur}$ be fixed. Actual states are calculated for
that date by the formulas (actual data):
where years() calculates the difference between two dates in years.
Consider two time series, of pay days and of due installments, for the
last 11 months before the fixed current date, given by the formulas:

$x^{act}_{days}(m) = x^{act}_{days}(T_{cur} - m),$

$x^{act}_{ndue}(m) = x^{act}_{ndue}(T_{cur} - m),$

where $m = 0, 1, \ldots, 11$.

The characteristics describing the evolution over time can be calculated
by the following formulas.

If all elements of the time series for the last $t$ months are available,
then (behavioral data):

$x^{beh}_{\eta}(t) = \left( \sum_{m=0}^{t-1} x^{act}_{\eta}(m) \right) / t,$
where $t = 3, 6, 9, 12$ and $\eta$ stands for days or ndue.

If not all elements of the time series are available, then (missing
imputation formulas):

$x^{beh}_{days}(t) = 15, \qquad x^{beh}_{ndue}(t) = 2. \qquad (2.1)$

In other words, behavioral variables represent average states over the
last 3, 6, 9 or 12 months. The user can easily add many other variables
by replacing the average with another statistic such as MAX or MIN.
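A minimal Python sketch of this averaging rule is given below. The
function name, the list layout (position $m$ holds the value $m$ months
before the current date) and the use of `None` for an unavailable value
are assumptions of the sketch; the imputation constants are read as 15
for pay days and 2 for due installments.

```python
# Imputation constants, assumed to be 15 (pay days) and 2 (due installments).
IMPUTED = {"days": 15, "n_due": 2}


def behavioral_average(series, t, name):
    """Average of the last t monthly values, or the imputed constant.

    series[m] is the value m months before the current date; None marks
    a value that is not available.
    """
    window = series[:t]
    if len(window) < t or any(v is None for v in window):
        return IMPUTED[name]
    return sum(window) / t


full_history = behavioral_average([0, 1, 2], 3, "n_due")     # complete window
short_history = behavioral_average([0, 1], 3, "n_due")       # too few months
gap_in_history = behavioral_average([-3, None, 5], 3, "days")  # missing value
print(full_history, short_history, gap_in_history)
```

Replacing `sum(window) / t` with `max(window)` or `min(window)` gives the
MAX and MIN variants mentioned above.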

2.6 Migration matrix adjustment 

The macro-economic variable $E(m)$ influences the migration matrix by the
formula:



2.7 Iteration step 

This step generates the next month of transactions, from $T_{cur}$ to
$T_{cur} + 1$. In every month some accounts are new; for them the
Transaction dataset is only updated as described in subsection 2.4.
Some accounts change their status by the formula:



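$x^{act}_{status} = C$ when $x^{act}_{npaid} = x^{loan}_{Ninst}$, $\quad x^{act}_{status} = B$ when $x^{act}_{ndue} = 7$,

and these accounts are not continued in the following months.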

For the other active accounts, events are generated in the next month:
payment or missing payment. They are based on two scorings:
where $t = 3, 6, 9, 12$; $\alpha$ = Income, Spending, $Nom_1, \ldots, Nom_4$,
$Int_1, \ldots, Int_4$; $\gamma$ = Inst, Ninst, Amount; $\eta$ = days, ndue;
$\delta$ = days, npaid, ndue, utl, dueutl, age, capacity, dueinc, loaninc,
seniority; and $\varepsilon$ and $\epsilon$ are taken from the standardized
normal distribution $N$.

Consider the following migration matrix:

$M^{act}_{ij} = \begin{cases} M^{adj}_{ij} & \text{when } Score_{Cycle} < Cutoff, \\ M_{ij} & \text{when } Score_{Cycle} > Cutoff, \end{cases}$

where Cutoff is another parameter, like all the $\beta$s and $\phi$s.

For fixed $T_{cur}$ and fixed $x^{act}_{ndue} = i$, all active accounts are
segmented by $Score_{Main}$ so as to match the proportions given by the
corresponding elements of the migration matrix $M^{act}_{ij}$: the first
group, $g = 0$, with the highest scores, has a share equal to $M^{act}_{i0}$,
the second, $g = 1$, has a share $M^{act}_{i1}$, ..., and the last group,
$g = 7$, has a share $M^{act}_{i7}$.

For a particular account assigned to group $g$, the payment is made in
month $T_{cur} + 1$ when $g \le i$; otherwise the payment is missing.
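The segmentation can be sketched in Python as below. This is a
simplified illustration under stated assumptions, not the original SAS
implementation: the function names, the dictionary of scores, the
rounding of group boundaries and the reading of the payment rule as
"group index not greater than the current number of due installments"
are all assumptions of the sketch.

```python
def assign_groups(scores, row_shares):
    """Split accounts into groups g = 0, 1, ... by descending score.

    `scores` maps account id -> Score_Main for accounts that all have the
    same current number of due installments i; `row_shares` is row i of
    the migration matrix (shares summing to 1).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    groups, start, cumulative = {}, 0, 0.0
    for g, share in enumerate(row_shares):
        cumulative += share
        end = min(n, round(cumulative * n))  # boundary of group g in the ranking
        for account in ranked[start:end]:
            groups[account] = g
        start = max(start, end)
    # Accounts left over by floating-point rounding join the last used group.
    last_used = max((g for g, s in enumerate(row_shares) if s > 0), default=0)
    for account in ranked[start:]:
        groups[account] = last_used
    return groups


def payment_made(g, i):
    """Assumed rule: a payment happens when the new state g is not above i."""
    return g <= i


# Ten hypothetical accounts with 1 due installment; acc0 has the best score.
scores = {f"acc{k}": 100 - k for k in range(10)}
row = [0.3, 0.5, 0.2, 0, 0, 0, 0, 0]
groups = assign_groups(scores, row)
print(groups["acc0"], groups["acc5"], groups["acc9"])
```

In this example the three best-scored accounts move to 0 due
installments, the next five stay at 1, and the two worst-scored accounts
miss the payment and move to 2.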

For a missing payment the Transaction dataset is updated with the
following information:

$x^{act}_{ndue} = g, \qquad x^{act}_{days} = \text{Missing}.$

For a payment, by the formulas:

and $x^{act}_{days}$ is generated from the distribution $D_{days}$.

The described steps are repeated for all months between $T_s$ and $T_e$.



2.8 Default definition 

Default is a typical credit scoring and Basel II notion. Every account
is observed from the observation point $T_{cur}$ during an outcome period
of 3, 6, 9 or 12 months. Over that period the maximal number of due
installments is analyzed, namely:

$MAX = \max_{m = 0, \ldots, t} x^{act}_{ndue}(T_{cur} + m),$
where $t = 3, 6, 9, 12$. Depending on the value of MAX, three values of the
default status $Default_t$ are defined:

Good: when MAX < 1, or when $x^{act}_{status} = C$ occurred during the
outcome period.

Bad: when MAX > 3, or when $x^{act}_{status} = B$ occurred during the
outcome period. In the case $t = 3$, when MAX > 2.

Indeterminate: in all other cases.
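The three-way rule can be written as a short Python function. The
sketch takes the inequalities exactly as printed above (strict), and it
assumes that the Good condition is checked before the Bad one; the
function name and argument layout are hypothetical.

```python
def default_status(max_due, statuses, t):
    """Classify an account for an outcome period of t months.

    max_due  -- maximal number of due installments in the outcome period
    statuses -- account statuses ("A", "C", "B") seen in the outcome period
    The order of the checks (Good first) is an assumption of this sketch.
    """
    if max_due < 1 or "C" in statuses:
        return "Good"
    bad_threshold = 2 if t == 3 else 3  # special case for the 3-month period
    if max_due > bad_threshold or "B" in statuses:
        return "Bad"
    return "Indeterminate"


print(default_status(0, ["A"], 12),
      default_status(4, ["A"], 12),
      default_status(3, ["A"], 12))
```

Dropping the Indeterminate class, as preferred in some analyses, would
only require changing the final `return`.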

The existence of the Indeterminate status can be questioned. In some
analyses only two statuses are preferred, for example in Basel II. This
is also a good topic for further research, which can be addressed with
the data generator described in this paper.



2.9 Portfolio segmentation and risk measures 

Credit scoring is typically used to control the following sub-portfolios
or processes:

Acceptance process - APP portfolio: the set of all starting points of
credits, where it is decided which applications are accepted or
rejected. The acceptance sub-portfolio is defined as the set of rows of
the Transaction dataset satisfying the condition $T_{cur} = T_{app}$. Every
account belongs to that set only once.

Cross-up sell process - BEH portfolio: the set of all accounts with a
history longer than 2 months and in good condition (without
delinquency). The cross-up sell, or behavioral, sub-portfolio is defined
as the set of rows of the Transaction dataset satisfying the condition
$x^{act}_{seniority} > 2$ and $x^{act}_{ndue} = 0$. Every account can belong
to that set many times.

Collection process - COL portfolio: the set of all accounts with
delinquency, but at the beginning of the collection process. The
collection sub-portfolio is defined as the set of rows of the
Transaction dataset satisfying the condition $x^{act}_{ndue} = 1$. Every
account can belong to that set many times.

For each of these sub-portfolios one can calculate and test risk
measures called bad rates, defined as the share of Bad statuses for
every observation point and outcome period.
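The three definitions and the bad rate can be illustrated with a short
Python sketch. The row layout (dictionaries with the keys `t_app`,
`t_cur`, `seniority`, `n_due` and a default status column) and both
function names are assumptions made for the example.

```python
from collections import defaultdict


def sub_portfolios(row):
    """Return the set of sub-portfolio labels a transaction row belongs to."""
    labels = set()
    if row["t_cur"] == row["t_app"]:
        labels.add("APP")
    if row["seniority"] > 2 and row["n_due"] == 0:
        labels.add("BEH")
    if row["n_due"] == 1:
        labels.add("COL")
    return labels


def bad_rates(rows, label, default_key):
    """Share of Bad statuses per observation point for one sub-portfolio."""
    totals, bads = defaultdict(int), defaultdict(int)
    for row in rows:
        if label in sub_portfolios(row):
            totals[row["t_cur"]] += 1
            bads[row["t_cur"]] += row[default_key] == "Bad"
    return {t: bads[t] / totals[t] for t in totals}


# Four hypothetical rows, all observed in month 5.
rows = [
    {"t_app": 1, "t_cur": 5, "seniority": 4, "n_due": 0, "default_12": "Good"},
    {"t_app": 2, "t_cur": 5, "seniority": 3, "n_due": 0, "default_12": "Bad"},
    {"t_app": 5, "t_cur": 5, "seniority": 0, "n_due": 0, "default_12": "Good"},
    {"t_app": 1, "t_cur": 5, "seniority": 4, "n_due": 1, "default_12": "Bad"},
]
beh = bad_rates(rows, "BEH", "default_12")
app = bad_rates(rows, "APP", "default_12")
col = bad_rates(rows, "COL", "default_12")
print(beh, app, col)
```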

In reality the definitions of these sub-portfolios can be more complex;
here the simplest versions are suggested for the further analysis of the
case studies presented in section 4.
3 General theory

3.1 The main assumption and definition
Definition. The layout

$(\ldots; D_{Birth}, D_{\alpha}, D_{\gamma}, D_{Applications}, D_{days}, Cutoff)$

with all the rules, symbols, relations and processes described in
section 2 is called the Retail Consumer Finance Data Generator for
fixed installment loans, abbreviated RCFDG.

Theorem - assumption. Every consumer finance portfolio of fixed
installment loans can be estimated by the RCFDG.

A proof of this theorem can always be constructed thanks to the terms
$\beta_r \varepsilon$ and $\phi_r \epsilon$ in formulas (2.2) and (2.3). From
the empirical point of view, credit scoring is always used in portfolio
control, so the theorem holds, but the problem lies in the goodness of
fit. The theory is still at too early a stage to define good measures of
fit; however, it is a proper starting point for the further development
of a general theory of consumer finance portfolios.

Similar ideas and research are presented in [5].

3.2 Open questions 

The next steps would probably concentrate on:

• Finding correct goodness-of-fit statistics measuring the distance
between a real consumer finance portfolio and the RCFDG. The properties
of these statistics should also be tested.

• Analyzing additional constraints, for example properties such as: the
predictive power, measured for example by Gini [12], of the
characteristic $x^{beh}_{days}(t)$ on $Default_t$ should be equal to 40%.

• Creating a more general case with all collection processes, more than
one credit per customer, more than one macro-economic factor, and other
detailed issues.

• Analyzing various existing real consumer finance portfolios and
finding the set of parameters describing each of them. Then a theory of
principal component analysis (PCA) of all consumer finance portfolios in
a particular country or in the world could be developed.

• Defining a generalization of the notion of consumer finance portfolio
that contains almost all properties of real portfolios.

• Using that generalized notion in research on the development of
scoring methods, as a general means of proving a method. For example,
the theorem "scoring models built on $Default_3$ and on $Default_{12}$
produce the same results" could be settled by the additional condition
that the betas for $t = 3$ and for $t = 12$ should be similar. It is very
probable that future research will discover many properties and
relations among the betas, the coefficients of the migration matrix, and
their consequences.



4 Two case studies 



4.1 Common parameters 

All random numbers are based on two typical random generators: the
uniform distribution $U$ and the standardized normal distribution $N$;
in detail, $U$ returns a number from the interval (0, 1) with equal
probability.

The common coefficients are the following: $T_s = 1970.01$ (January 1970),
$T_e = 1976.12$ (December 1976),
$E(m) = 0.01 + (1.5 + \sin((5 \cdot \pi \cdot m)/(T_e - T_s)) + N/5)/8$,
$D_{Applications} = 300 \cdot 30 \cdot (1 + N/20)$; if $T_{app}$ is December then
$D_{Applications} = D_{Applications} \cdot 1.2$.
To define $D_{Birth}$, the distribution of age is defined first:
$D_{Age} = ((75 - 18) \cdot (N + 4)/7 + 10 + 20 \cdot U)$; if Age > 75 then
Age = 75, if Age < 18 then Age = 18. $D_{Birth} = T_{app} - D_{Age} \cdot 365.5$,
$D_{Income} = int((10000 - 500)/40 \cdot 10 \cdot abs(N) + 500)$,
$D_{Inst} = int(Income \cdot abs(N)/4)$,
$D_{Spending} = int(Income \cdot abs(N)/4)$,
$D_{Ninst} = int(30 \cdot abs(N)/4 + 6)$; if $N_{inst} < 6$ then $N_{inst} = 6$,
$D_{Nom_i} = int(5 \cdot abs(N))$ and $D_{Int_i} = 10 \cdot U$, for $i = 1, 2, 3, 4$;
if $x^{act}_{ndue} < 2$ then $D_{days} = -int(15 \cdot (abs(N)/4))$, else
$D_{days} = int(15 \cdot (N/4))$,
where int() and abs() denote the integer part and the absolute value,
respectively.
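To make the cyclic variable concrete, the following Python sketch
evaluates $E(m)$ month by month. It is an illustration only: the function
name is hypothetical, and it assumes that $T_e - T_s$ is measured in
months (83 for January 1970 to December 1976) and that $N$ is drawn
independently for every month.

```python
import math
import random


def macro_variable(m, n_months, noise):
    """E(m) = 0.01 + (1.5 + sin(5*pi*m / n_months) + noise/5) / 8."""
    return 0.01 + (1.5 + math.sin(5 * math.pi * m / n_months) + noise / 5) / 8


N_MONTHS = 83  # assumed: months between January 1970 and December 1976
rng = random.Random(0)  # fixed seed only to make the sketch repeatable

# One value per month, with a fresh standard normal draw each time.
series = [macro_variable(m, N_MONTHS, rng.gauss(0, 1))
          for m in range(N_MONTHS + 1)]

# The same curve without the random term, to show the pure cycle.
noise_free = [macro_variable(m, N_MONTHS, 0.0) for m in range(N_MONTHS + 1)]
print(min(noise_free), max(noise_free))
```

Without the random term the value stays between 0.0725 and 0.3225, so
the cycle itself is well inside the interval (0.01, 0.9). A strongly
negative draw of $N$ at the bottom of the cycle could push $E(m)$ below
0.01, so an implementation may want to check the condition explicitly.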

To avoid scale or unit problems for individual variables, it is
suggested to perform a simple standardization step on the ABT table for
every $T_{cur}$ before the score calculation. That idea is quite realistic:
even customers who are good payers can have more problems in a crisis,
so the general condition of the current month can influence all
customers. On the other hand, to present two interesting cases, it was
decided to standardize the variables by global parameters.

The scoring formula for $Score_{Main}$ is calculated based on Table 1,
namely:

$Score_{Main} = \sum_{index=1}^{28} \beta_{index} (x_{index} - \mu_{index}) / \sigma_{index}.$

All beta coefficients could be recalculated without the standardization
step, but then they would be more difficult to interpret. A simple
inspection of Table 1 shows that the most significant variables have an
absolute value of beta equal to 6.
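The standardized score is a plain weighted sum, as the following Python
sketch shows. The function name and the two-variable example values are
hypothetical; in the paper the sum runs over the 28 variables of Table 1.

```python
def score_main(values, betas, means, stds):
    """Sum of beta * (x - mean) / std over all variables."""
    if not (len(values) == len(betas) == len(means) == len(stds)):
        raise ValueError("all inputs must have the same length")
    return sum(b * (x - mu) / sd
               for x, b, mu, sd in zip(values, betas, means, stds))


# Two hypothetical variables: the first has a strong weight (beta = 6).
example_score = score_main(values=[2.0, 10.0], betas=[6.0, -1.0],
                           means=[1.0, 8.0], stds=[0.5, 2.0])
print(example_score)
```

Because every variable is expressed in standard deviations from its
mean, the betas are directly comparable, which is the interpretability
argument made above.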

4.2 The first case study - unstable application characteristic - APP

In this case it is assumed that only customers with low income can be
influenced by a crisis. In this data generator the application
characteristic income is a variable stable over time, and the migration
matrix is adjusted by the macro-economic variable $E(m)$ only for the
cases:

$x^{app}_{Income} < 1800.$

This relation can easily be transformed into the general form (2.3).
Table 1: Scoring formula for $Score_{Main}$.
4.3 The second case study - unstable behavioral characteristic - BEH

Here the condition for the migration matrix adjustment is the following:

$x^{beh}_{ndue}(6) > 0 \text{ and } x^{act}_{seniority} > 6,$

where the rule for the seniority variable is added so as not to adjust
accounts with missing-value imputation based on (2.1). This case
represents the situation in which the crisis has an impact on customers
who had some delinquency during the last 6 months.

4.4 Stability problem 

Consider the typical process of building scoring models, for example on
the behavioral sub-portfolio. Because the two cases are based on two
variables, one application and one behavioral, only these two variables
are considered. To expose strong instability, the models are analyzed
with the target variable $Default_9$.

Every variable is segmented (binned) into a few attributes, described in
Tables 2 and 3.

In the case of the unstable application variable (APP), Figure 6
confirms what is expected: attribute 2 is very stable over time, and
accounts from that group are not very sensitive to crisis changes. In
contrast, attribute 1 is very unstable. In the case of the unstable
behavioral variable (BEH) the same groups are both unstable, see Figure
7. The same group, the accounts from attribute 2, is presented in Figure
5 for both cases, to show on a better scale that in the APP case it is
really possible to select accounts not sensitive to the crisis. Even
though the data generator is a simplification of real data, this
conclusion is very useful. Some application data can be valuable in risk
management for identifying sub-segments whose risk is stable over time.

The same conclusions cannot be drawn for the behavioral variable
$x^{beh}_{ndue}(6)$. Figure 3 presents the risk evolution for the three
attributes of that variable. None of them is stable. The most stable is
attribute number 3. In the case BEH that attribute is not stable either,
see Figure 4. To confirm this, Figure 2 presents attribute 3 alone for
both cases. Clearly both cases have unstable risk. Even though in the
case BEH attribute 3 is expected to have stable risk, because of the
rule for the migration matrix adjustment, this expectation fails. The
reason follows from a correct understanding of the process. The typical
scoring approach is based on the principal idea that historical
information up to the observation point is able to predict behavior
during the outcome period. Up to the observation point the account did
not have any delinquency, so the variable $x^{beh}_{ndue}(6) = 0$. After
that point, in the following months, the account can have due
installments. It can then be adjusted by the macro-economic variable,
and in the end that group can become unstable.

Table 2: Simple binning for two variables in the case APP.

Characteristic       Attribute  Condition                                              Bad rate on   Population  Gini on
                     number                                                            $Default_9$   percent     $Default_9$
$x^{beh}_{ndue}(6)$  1          $x^{act}_{seniority} < 6$                              16.77%        37.09%      51.34%
                     2          $x^{beh}_{ndue}(6) > 0$ and $x^{act}_{seniority} > 6$  6.48%         22.49%
                     3          otherwise                                              1.07%         40.42%
$x^{app}_{Income}$   1          $< 1800$                                               20.11%        18.32%      36.29%
                     2          $> 1800$                                               4.72%         81.68%

This idea is very important for further research on the crisis. It
should be emphasized that typical scoring methods used on the three
types of sub-portfolios, APP, BEH and COL, cannot correctly discover the
rule of the crisis adjustment and cannot identify sub-segments that are
stable over time. Of course, scoring can also be used, just as in this
paper, for the prediction of migration states; to be clear, not for
predicting default statuses but for predicting transitions. The best
method is probably survival analysis [13] or [13] with time covariates
(time-dependent variables), which naturally captures the factor of being
a better or worse payer at the correct time. In the typical scoring
model that factor is considered only up to the observation point; in the
survival model it can also be taken into account after the observation
point, and so in a more realistic way.

Many other cases of data generators were also built, with more complex
rules for $Score_{Cycle}$. If both types of variables, application and
behavioral, are taken together, the case is too complicated and the
instability is present everywhere. In that case it is not possible to
find a stable factor. This conclusion is also very important for crisis
analysis, because it describes the nature of a crisis: if it is a strong
event and it has an impact on both types of characteristics, behavioral
and application, then it is present everywhere, and risk management can
only try to find sub-segments that are more stable than others, or whose
maximal risk does not exceed an expected boundary.
Table 3: Simple binning for two variables in the case BEH.

Characteristic       Attribute  Condition                                              Bad rate on   Population  Gini on
                     number                                                            $Default_9$   percent     $Default_9$
$x^{beh}_{ndue}(6)$  1          $x^{act}_{seniority} < 6$                              19.49%        40.05%      46.54%
                     2          $x^{beh}_{ndue}(6) > 0$ and $x^{act}_{seniority} > 6$  14.04%        16.52%
                     3          otherwise                                              1.74%         43.43%
$x^{app}_{Income}$   1          $< 1800$                                               12.09%        39.49%      5.04%
                     2          $> 1800$                                               10.09%        60.51%


4.5 Various types of risk measures 

Let a crisis be defined as the time when the risk is the highest. The
most popular reporting for risk management is based on bad rates,
vintage and flow rates. Figure 1 presents bad rates for three different
sub-portfolios: application, behavioral and collection. One flow rate is
also presented. A simple conclusion is that the crisis does not occur at
the same time in every report. Some curves show a local maximum of risk
earlier than others. The difference in time is significant and can be
almost 6 months, so it is very important to remember which kinds of
reports can indicate a crisis as quickly as possible. It should be
emphasized that bad rate reports present, in the standard way, the
evolution of risk by observation points, and the crisis can occur
between the observation point and the end of the outcome period. It
seems that flow rate reports pinpoint the crisis time more precisely.

4.6 Implementation 

All data were prepared in the SAS System [TT] with hand-written code in
SAS 4GL, using the modules Base SAS and SAS/STAT. For the case of the
unstable behavioral variable (BEH), the Production dataset has 779 993
rows (about 90MB) and the Transaction dataset has 8 969 413 rows (about
400MB). The total calculation time for one case is about 4 hours.



Figure 1: Risk measures on $Default_9$ compared across the sub-portfolios
APP, BEH and COL, together with one flow rate $M_{23}$. (Plot title: Risk
measures on different sub-portfolios; horizontal axis: Year.Month;
series: APP, BEH, COL, Flow rate 2-3.)

Figure 2: Risk measures on $Default_9$ on attribute 3 of variable
$x^{beh}_{ndue}(6)$ for the two cases APP and BEH. (Plot title: Risk
measures on attribute 3 of behavioral variable; horizontal axis:
Year.Quarter; series: APP Attribute 3, BEH Attribute 3.)

Figure 3: Risk measures on $Default_9$ on attributes of variable
$x^{beh}_{ndue}(6)$ for the case APP. (Plot title: Risk measures on
attributes of behavioral variable; horizontal axis: Year.Quarter;
series: Attribute 1, Attribute 2, Attribute 3.)

Figure 4: Risk measures on $Default_9$ on attributes of variable
$x^{beh}_{ndue}(6)$ for the case BEH.

Figure 5: Risk measures on $Default_9$ on attribute 2 of variable
$x^{app}_{Income}$ for the two cases APP and BEH.

Figure 6: Risk measures on $Default_9$ on attributes of variable
$x^{app}_{Income}$ for the case APP. (Plot title: Risk measures on
attributes of application variable.)

Figure 7: Risk measures on $Default_9$ on attributes of variable
$x^{app}_{Income}$ for the case BEH. (Plot title: Risk measures on
attributes of application variable; horizontal axis: Year.Quarter;
series: Attribute 1, Attribute 2.)

5 Conclusions

Even though the data are generated by a random, simulated process, which
is not realistic, the conclusions make it possible to better understand
the nature of the crisis.

The banking data generator is a promising tool in the search for a
method of proving comparisons between credit scoring techniques. It is
probable that in the future many randomly generated datasets will become
a new repository for testing and comparisons.

In the first case, with an unstable application variable such as income,
it is possible to split the portfolio into two parts: stable and
unstable over time. In the second case, with an unstable behavioral
characteristic, the task is more complicated and it is not possible to
split in the same way. Some sub-segments can have better stability, but
they always fluctuate. Moreover, if a crisis acts through many factors,
both application-form customer characteristics and customer behavior, it
is very difficult to identify these factors, and the crisis appears
everywhere in the reports.

The generated data are very useful for various analyses and research.
There are many rows and many bad default statuses, so an analyst can
carry out many good exercises to gain experience.